Modeling biomarker variability in joint analysis of longitudinal and time-to-event data

Summary The role of visit-to-visit variability of a biomarker in predicting related disease has been recognized in medical science. Existing measures of biological variability are criticized for being entangled with random variability resulted from measurement error or being unreliable due to limited measurements per individual. In this article, we propose a new measure to quantify the biological variability of a biomarker by evaluating the fluctuation of each individual-specific trajectory behind longitudinal measurements. Given a mixed-effects model for longitudinal data with the mean function over time specified by cubic splines, our proposed variability measure can be mathematically expressed as a quadratic form of random effects. A Cox model is assumed for time-to-event data by incorporating the defined variability as well as the current level of the underlying longitudinal trajectory as covariates, which, together with the longitudinal model, constitutes the joint modeling framework in this article. Asymptotic properties of maximum likelihood estimators are established for the present joint model. Estimation is implemented via an Expectation-Maximization (EM) algorithm with fully exponential Laplace approximation used in E-step to reduce the computation burden due to the increase of the random effects dimension. Simulation studies are conducted to reveal the advantage of the proposed method over the two-stage method, as well as a simpler joint modeling approach which does not take into account biomarker variability. Finally, we apply our model to investigate the effect of systolic blood pressure variability on cardiovascular events in the Medical Research Council elderly trial, which is also the motivating example for this article.

and P (B ⊤ i B i is full rank) are positive.Moreover, if there exist a constant vector C 0 and a constant C0 such that with a positive probability, W ⊤ i C 0 = C0 , then C 0 = 0 and C0 = 0.
(i) Since exp(−x) ⩽ x −2 for any x > 0, we have ; which leads to Therefore, the jump size of Λ 0 (•) must be finite.Otherwise if Λ 0 {T i } → ∞ at some uncensored event time T i , then l m (θ, Λ 0 ) → −∞.On the other hand, θ belongs to a compact set Θ. Then the maximum likelihood estimate ( θ, Λ0 ) exists.
Simple algebraic operations lead to the following expression for m −1 l m ( θ, Λ 0 ): where , and According to condition (C.5), we have and where C j , j = 1, .., 6 are some positive constants.Let E Φ (•) denote the expectation with respect to standard multivariate normal distribution, due to the concavity of logarithm function and inequalities where C 7 , C 8 and C 9 are another three positive constants.Thus, the third term in (A.1) satisfies which, by the strong law of large numbers, can be bounded by some constant C 10 .Then (A.1) becomes where C 11 is a constant.
Additionally, using the inequality that exp(−x) ⩽ (1 + x/Γ) −Γ for any Γ > 0 and x > 0, we have Substituting (A.5) into (A.4) leads to the following expression In this way (A.6) becomes from which we obtain an upper bound for we have proved that with probability 1, Λ(τ ) is bounded for any sample size m.
with subscript i removed to denote the corresponding random variables.Differentiating l m (θ, Λ 0 ) with respect to the jump size Λ 0 {T i }, we obtain that Λ0 {T i } satisfies the equation To express concisely, we define G(b i , O i ; θ, Λ 0 ) as and further define Additionally, we introduce operator notations where P m is exactly the empirical measure from m i.i.d observations and P is the true probability measure which, in our model, is the abbreviation of P θ * ,Λ * 0 .Under notations defined above, (A.7) can be rewritten as and thus Λ0 {t} satisfies the following equation On the other hand, derivations similar to that in Tsiatis (1981) yield the expression for the true baseline hazard function To establish the connection between Λ0 {t} and Λ * 0 (t), we construct another function Λ0 (t) ∈ Z m with jump size given by Equivalently, The conclusion is that Λ0 (t) uniformly converges to Λ * 0 (t) in [0, τ ].To prove this, we focus on the difference between Λ0 (t) and Λ * 0 (t): are Glivenko-Cantelli classes, the two terms on the left-hand side of (A.8) converge to zero.Therefore, we arrive at the conclusion that Λ0 (t) uniformly converges to Λ * 0 (t).
Additionally, the definition of Λ0 (t) gives (A.9) (A.9) implies Λ0 (t) is absolutely continuous with respect to Λ0 (t) and the Radon-Nikodym derivative is given by On the other hand, the conclusion of (ii) implies there exists a subsequence of Λ0 which weakly converges to some right-continuous monotone function Λ # 0 .Further, we can choose a sub-subsequence of θ which converges to some θ # .What we need to do next is to show Λ # 0 = Λ * 0 and θ # = θ * .Our discussions below are restricted to the sub-subsequence satisfying Λ0 → Λ # 0 and θ → θ # .
On the other hand, (A.12) Since log[ G(b, O; θ, Λ0 )db/ G b, O; θ * , Λ0 db] belongs to a Glivenko-Canteli class and again using bound converge theorem, we have Similarly, Thus, taking limits on both side of (A.12) gives However, the left-hand side of the inequality above is exactly the negative Kullback-Leibler divergence.
Then it follows that, with probability one, Under assumptions specified in the beginning, our proposed joint model is identifiable and thus we conclude from (A.13) that θ # = θ * and Λ # 0 = Λ * 0 .
Then we have Under the assumptions specified in the beginning, we can get h 1 = 0 and h 2 = 0.
On the other hand, using derivation similar to that in (A.14), we can verify that the class is P-Donsker for some δ > 0 and that sup (h1,h2)∈H Then according to statement of Lemma 3.3.5 in Van Der Vaart and Wellner (1996), (A.17) and (A.18) combined with the consistency of ψm = ( θ, Λ0 ) yield .19) where G m = √ m (S m − S).Therefore, by the definition of ψn , we have where the second equality holds due to (A.19).Since S is Frèchet differentiable at ψ * , the left hand-side of (A.20) can be replaced by .

B.1 Expressions in M-step
The closed updating forms for ξ, σ 2 and D are given by and as defined in section 4.1.Let a ⊗2 = aa ⊤ for a column vector a, then the components of S ϕ1 and I ϕ1 can be finally expressed as follows, − m j=1 Y j (T i )E j(k) r(b j , T i , θ) b⊤ j K(t 0 , T i ) bj To mimic the MRC trial, simulation data in Case 2 is generated as follows: • Set sample size m = 3700 and t ij = (0, 0.041, 0.082, 0.166, 0.25, 0.5, 0.75, 1, ..., 5.75) with fixed difference 0.25 from 1 to 5.75.Baseline covariates in the longitudinal model are specified as x i = (x 1i , x 2i , x 3i ) ⊤ , where x 1i and x 2i are binary covariates (corresponding to sex and smoke indicator in MRC ) generated independently from Bernoulli distribution with probability 0.42 and 0.164 respectively; x 3i is generated from truncated normal distribution with mean 7, variance 0.29 and truncated interval [6.5, 7.4] (denoted as TN(7,0.29;[6.5,7.4]) in the following) to mimic the shape of ages (divided by 10) in the MRC trial.The external stimulus, z ij , is specified as a single covariate taking value 1 if t ij falls into the set {0, 1, 2, 3, 4, 5} and 0 otherwise.Cubic B-spline bases are constructed with two interior knots located at 0.25 and 1.5 (q=6).b i is of q − 1 dimension rather than q because the variance of b 1i is very close to 0 as suggested by the MRC trial data analysis.
• Baseline covariates w i = (w 1i , w 2i , x ⊤ i ) ⊤ in the survival model is of five dimension where (w 1i , w 2i , 1− w 1i −w 2i ), corresponding to the treatment indicator in the MRC trial, is generated from multinomial distribution with parameters n = 1 and p = (0.25, 0.25, 0.5).(w 1i , w 2i ) also makes an impact on the longitudinal process via interaction terms between spline bases and treatments in fixed-effects part.β (1) k and β (2) k denote the effects of interaction terms w 1i * B k (t) and w 2i * B k (t) respectively, for k = 1, ..., q.

(A. 6 )
where C 12 (Γ) is a deterministic function of Γ.By the strong law of large numbers, m i=1 I(T i = τ )/m a.s.→ P(T = τ ) > 0. Thus we can choose Γ large enough such that Γ m m i=1 j (T i )E j(k) [r(b j , T i , θ)]Case 2: Simulation study based on the MRC trial is the negative information matrix in the submodel θ * + εh 1 , Λ * 0 + * , Λ * 0